Acoustic Modeling for Google Home

نویسندگان

  • Bo Li
  • Tara N. Sainath
  • Arun Narayanan
  • Joe Caroselli
  • Michiel Bacchiani
  • Ananya Misra
  • Izhak Shafran
  • Hasim Sak
  • Golan Pundak
  • Kean K. Chin
  • Khe Chai Sim
  • Ron J. Weiss
  • Kevin W. Wilson
  • Ehsan Variani
  • Chanwoo Kim
  • Olivier Siohan
  • Mitch Weintraub
  • Erik McDermott
  • Richard Rose
  • Matt Shannon
چکیده

This paper describes the technical and system building advances made to the Google Home multichannel speech recognition system, which was launched in November 2016. Technical advances include an adaptive dereverberation frontend, the use of neural network models that do multichannel processing jointly with acoustic modeling, and Grid-LSTMs to model frequency variations. On the system level, improvements include adapting the model using Google Home specific data. We present results on a variety of multichannel sets. The combination of technical and system advances result in a reduction of WER of 8-28% relative compared to the current production system.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home

We describe the structure and application of an acoustic room simulator to generate large-scale simulated data for training deep neural networks for far-field speech recognition. The system simulates millions of different room dimensions, a wide distribution of reverberation time and signal-to-noise ratios, and a range of microphone and sound source locations. We start with a relatively clean t...

متن کامل

Robust Speech Recognition via Anchor Word Representations

A challenge for speech recognition for voice-controlled household devices, like the Amazon Echo or Google Home, is robustness against interfering background speech. Formulated as a far-field speech recognition problem, another person or media device in proximity can produce background speech that can interfere with the device-directed speech. We expand on our previous work on device-directed sp...

متن کامل

A High Order Approximation of the Two Dimensional Acoustic Wave Equation with Discontinuous Coefficients

This paper concerns with the modeling and construction of a fifth order method for two dimensional acoustic wave equation in heterogenous media. The method is based on a standard discretization of the problem on smooth regions and a nonstandard method for nonsmooth regions. The construction of the nonstandard method is based on the special treatment of the interface using suitable jump conditio...

متن کامل

Memory-Efficient Modeling and Search Techniques for Hardware ASR Decoders

This paper gives an overview of acoustic modeling and search techniques for low-power embedded ASR decoders. Our design decisions prioritize memory bandwidth, which is the main driver in system power consumption. We evaluate three acoustic modeling approaches–Gaussian mixture model (GMM), subspace GMM (SGMM) and deep neural network (DNN)–and identify tradeoffs between memory bandwidth and recog...

متن کامل

Unsupervised discovery and training of maximally dissimilar cluster models

One of the difficult problems of acoustic modeling for Automatic Speech Recognition (ASR) is how to adequately model the wide variety of acoustic conditions which may be present in the data. The problem is especially acute for tasks such as Google Search by Voice, where the amount of speech available per transaction is small, and adaptation techniques start showing their limitations. As trainin...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017